Compiler Reduction of Invalidation Traffic in Virtual Shared Memory Systems
Authors
Abstract
This paper presents a new compiler analysis for the elimination of invalidation traffic in virtual shared memory, using a hybrid distributed invalidation coherence scheme. The invalidation and acknowledgement messages are removed; this reduces both network invalidation traffic and the latency of a write fault. The analysis aggressively exploits the SPMD execution model and uses array section analysis to determine accurately only those instances when invalidation is necessary, thus avoiding the additional read misses of previous schemes. Equations determining precisely what data should be invalidated are presented and translated into a form amenable to compiler analysis. Preliminary experimental results on a 30-node prototype architecture demonstrate the performance attainable using this scheme.
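The idea of using array sections under an SPMD schedule to decide what to invalidate can be illustrated with a minimal sketch. It is not the paper's equations, which this abstract does not give. It assumes a one-dimensional array, a block schedule, a write phase in which each processor writes its own block, and a read phase in which each processor reads its block shifted by a fixed offset. The names `Section`, `block`, and `stale_sections` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Half-open index range [lo, hi) of a one-dimensional array."""
    lo: int
    hi: int

    def intersect(self, other):
        """Return the overlapping section, or None if the two are disjoint."""
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Section(lo, hi) if lo < hi else None


def block(n, nprocs, p):
    """Iterations owned by processor p under an assumed block schedule."""
    size = -(-n // nprocs)  # ceiling division
    return Section(min(p * size, n), min((p + 1) * size, n))


def stale_sections(n, nprocs, me, read_offset):
    """Sections processor `me` would self-invalidate before the read phase.

    Assumed program shape (illustrative only):
      phase 1: processor p writes a[i] for every i in block(p)
      phase 2: processor p reads a[i + read_offset] for every i in block(p)
    Only data that `me` reads and that another processor wrote can be stale.
    """
    mine = block(n, nprocs, me)
    lo = max(mine.lo + read_offset, 0)
    hi = min(mine.hi + read_offset, n)
    if lo >= hi:
        return []
    reads = Section(lo, hi)
    stale = []
    for p in range(nprocs):
        if p == me:
            continue  # data written locally is already up to date
        overlap = reads.intersect(block(n, nprocs, p))
        if overlap is not None:
            stale.append(overlap)
    return stale
```

With 100 elements on 4 processors, `stale_sections(100, 4, 1, 1)` yields the single element at index 50, the only datum processor 1 reads that a neighbour wrote, while `stale_sections(100, 4, 1, 0)` is empty, so no invalidation would be issued in that case.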
Similar resources
A compiler algorithm to reduce invalidation latency in virtual shared memory systems
This paper presents a new compiler algorithm to eliminate invalidation traffic in virtual shared memory using a hybrid distributed invalidation scheme. It aggressively exploits static scheduling and data layout to accurately determine only those instances when invalidation is necessary, thus avoiding the additional read misses of previous schemes. Equations determining precisely what data shoul...
Fast & Cost Effective Cache Invalidation in DSM
Most distributed shared memory systems use point-to-point networks in conjunction with directory-based cache coherence protocols. A cache invalidation transaction generates a number of unicast invalidation messages and as many acknowledgment messages. This results in heavy network traffic, high latency, and high occupancy at home nodes. This paper introduces a fast cache invalidation method, calle...
User-Level VSM Optimization and its Application
This paper describes user-level optimisations for virtual shared memory (VSM) systems and demonstrates performance improvements for three scientific kernel codes written in Fortran-S and running on a 30-node prototype distributed memory architecture. These optimisations can be applied to all consistency models and directory schemes, whether in hardware or software, which employ an invalidation b...
Processor-Directed Cache Coherence Mechanism – A Performance Study
Cache-coherent multiprocessor architectures are widely used in recent multi-core systems, embedded systems and massively parallel processors. With the ever-increasing performance gap between processor and memory, a cache-coherent multiprocessor requires an optimal cache coherence mechanism. The conventional directory based cache coherence scheme used in large scale multip...
Fine Grain Synchronisation in VSM Architectures
This paper presents a new scheme to replace coarse grain barriers with fine grain synchronisation in virtual shared memory systems. Traditionally, shared memory programming models separate data access from synchronisation. In our scheme synchronisation between both writes and their subsequent reads, and reads and their following writes, is achieved through the coherence tags associated with each ...